Linguistic Issues in Grace (Evaluation of Part-of-Speech Tagging for French)
Abstract
GRACE is the first large-scale evaluation program for French taggers. The experiment made it possible to compare the part-of-speech tags assigned by a range of taggers on a common corpus of literary and journalistic texts. The evaluation relied on the participants accepting a reference formalism for morpho-syntactic description (the reference tagset), which an expert used to tag the evaluation corpus and which the participants used to describe their own tagsets (the mapping table). The global strategy was to make the reference tagging and tokenization as fine-grained as possible. The reference tags were decomposed into a part of speech (main category) and a list of additional attributes, thus defining detailed syntactic patterns. The steps of the GRACE program are described, and the main adjudication issues are reported. The linguistic issues encountered during the experiment were linked to the difficulty of projecting relevant information about sentence structure onto the token level. Information derived from local analysis may be accepted, as may information derived from a larger context. Although the limitations of the experiment are acknowledged, the GRACE program offered useful opportunities to assess the state of the art in PoS tagging.
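The abstract describes reference tags split into a main category plus attributes, and a mapping table that relates each participant's tagset to the reference tagset. The Python sketch below illustrates that general idea only. The tag names, the `RefTag` type, the `MAPPING` table, and the accuracy measure are all hypothetical and are not taken from the GRACE protocol.

```python
from typing import NamedTuple


class RefTag(NamedTuple):
    """A reference tag: main category plus additional attributes (hypothetical layout)."""
    pos: str
    attrs: frozenset


# Hypothetical mapping table: each participant tag lists the reference tags
# it is considered compatible with. A coarse participant tag maps to several.
MAPPING = {
    "NOM_SG": {RefTag("N", frozenset({"sg"}))},
    "VERB": {RefTag("V", frozenset({"sg"})), RefTag("V", frozenset({"pl"}))},
}


def is_compatible(participant_tag, reference_tag, mapping=MAPPING):
    """True if the mapping table lists reference_tag for participant_tag."""
    return reference_tag in mapping.get(participant_tag, set())


def accuracy(system_tags, reference_tags, mapping=MAPPING):
    """Share of tokens whose system tag is compatible with the reference tag.

    This scoring rule is an assumption made for illustration.
    """
    if len(system_tags) != len(reference_tags):
        raise ValueError("both sequences must cover the same tokens")
    if not reference_tags:
        return 0.0
    hits = sum(
        is_compatible(s, r, mapping) for s, r in zip(system_tags, reference_tags)
    )
    return hits / len(reference_tags)
```

With this layout, a coarse tagger is not penalized for leaving out attributes it never claimed to predict, since its tag maps to every reference tag that shares the main category.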
Similar resources
DisMo: A Morphosyntactic, Disfluency and Multi-Word Unit Annotator. An Evaluation on a Corpus of French Spontaneous and Read Speech
We present DisMo, a multi-level annotator for spoken language corpora that integrates part-of-speech tagging with basic disfluency detection and annotation, and multi-word unit recognition. DisMo is a hybrid system that uses a combination of lexical resources, rules, and statistical models based on Conditional Random Fields (CRF). In this paper, we present the first public version of DisMo for ...
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
A Part-of-Speech Tagging System for the Persian Language
Part-Of-Speech (POS) tagging is an essential step for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...
Tagging French Without Lexical Probabilities - Combining Linguistic Knowledge And Statistical Learning
This paper explores morpho-syntactic ambiguities for French to develop a strategy for part-of-speech disambiguation that a) reflects the complexity of French as an inflected language, b) optimizes the estimation of probabilities, c) allows the user flexibility in choosing a tagset. The problem in extracting lexical probabilities from a limited training corpus is that the statistical model may n...
Combining Linguistic Knowledge and Statistical Learning in French Part-of-Speech Tagging
This paper presents a new part-of-speech tagger that takes into account both linguistic knowledge and statistical learning. Its novelty lies in several aspects: (a) a fully modular architecture that allows the user flexible use of each independent module, (b) an expanded tagset that gives the user the flexibility to use it directly or use any individually defined tag subset, (c) the exportabi...